I also wonder whether I really need the distinction between sources and sinks. Why? Because filters.
When creating the pipeline I start with a single source. Then I add up to n sinks to each source. So far, so good. Now, when iterating over the pipeline I start from the source and get its sinks. And here comes the ugly part: in order to continue iterating I need to check whether each sink is also a source (because that is what filters are: sinks that are also sources). That means an instanceof check, which is almost always a sign of bad design.
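To make the problem concrete, here is a minimal Java sketch of that traversal. The type names (`Source`, `Sink`, `Filter`), the concrete classes and the `visit` helper are all assumptions made up for illustration, not an existing API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types: a sink only receives data, a source only hands it out.
interface Sink {
    String name();
}

interface Source {
    String name();
    List<Sink> sinks();
}

// A filter is a sink that is also a source.
interface Filter extends Source, Sink { }

final class FileSource implements Source {
    private final List<Sink> sinks = new ArrayList<>();
    public String name() { return "file-source"; }
    public List<Sink> sinks() { return sinks; }
}

final class UpperCaseFilter implements Filter {
    private final List<Sink> sinks = new ArrayList<>();
    public String name() { return "upper-case-filter"; }
    public List<Sink> sinks() { return sinks; }
}

final class ConsoleSink implements Sink {
    public String name() { return "console-sink"; }
}

final class PipelineWalk {
    // Depth-first walk starting at a source, recording every node it reaches.
    static void visit(Source source, List<String> visited) {
        visited.add(source.name());
        for (Sink sink : source.sinks()) {
            // The check in question: the walk can only continue
            // if this sink happens to be a source as well.
            if (sink instanceof Source) {
                visit((Source) sink, visited);
            } else {
                visited.add(sink.name());
            }
        }
    }

    public static void main(String[] args) {
        FileSource source = new FileSource();
        UpperCaseFilter filter = new UpperCaseFilter();
        filter.sinks().add(new ConsoleSink());
        source.sinks().add(filter);

        List<String> visited = new ArrayList<>();
        visit(source, visited);
        // prints "file-source -> upper-case-filter -> console-sink"
        System.out.println(String.join(" -> ", visited));
    }
}
```

The cast inside `visit` is the part that feels wrong: the declared type of the list elements (`Sink`) does not say enough to keep walking.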
So, why not ditch the distinction between sources, filters, and sinks and only use filters instead? Normal filters would still output nothing until you have put some data in, and real sources (e.g. a file source or a network source) would simply start delivering data as soon as you ask them to.
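A rough sketch of what the filters-only design might look like in Java, under several assumptions: the single interface is called `Stage` here (it plays the role of "filter" in the text), data items are plain strings, an in-memory list stands in for a real file or network source, and the pipeline is a tree. None of these names come from an existing library.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Queue;

// Hypothetical single node type: every pipeline element is a "filter".
interface Stage {
    void put(String item);      // feed data in
    String poll();              // next output, or null if nothing is available
    List<Stage> downstream();   // stages that receive this stage's output
}

abstract class AbstractStage implements Stage {
    private final List<Stage> downstream = new ArrayList<>();
    public List<Stage> downstream() { return downstream; }
}

// A "real source": delivers data as soon as it is asked, accepts no input.
final class ListSource extends AbstractStage {
    private final Iterator<String> items;
    ListSource(List<String> items) { this.items = items.iterator(); }
    public void put(String item) {
        throw new UnsupportedOperationException("a real source takes no input");
    }
    public String poll() { return items.hasNext() ? items.next() : null; }
}

// A normal filter: outputs nothing until something has been put in.
final class UpperCase extends AbstractStage {
    private final Queue<String> pending = new ArrayDeque<>();
    public void put(String item) { pending.add(item.toUpperCase(Locale.ROOT)); }
    public String poll() { return pending.poll(); }
}

// What used to be a sink: a stage that consumes input and never outputs.
final class Collector extends AbstractStage {
    final List<String> received = new ArrayList<>();
    public void put(String item) { received.add(item); }
    public String poll() { return null; }
}

final class UniformPipeline {
    // Drain a stage into its downstream stages, then do the same for each
    // of them. Every node has the same type, so no instanceof is needed.
    static void run(Stage stage) {
        String item;
        while ((item = stage.poll()) != null) {
            for (Stage next : stage.downstream()) {
                next.put(item);
            }
        }
        for (Stage next : stage.downstream()) {
            run(next);
        }
    }

    public static void main(String[] args) {
        ListSource source = new ListSource(Arrays.asList("a", "b"));
        UpperCase upper = new UpperCase();
        Collector out = new Collector();
        source.downstream().add(upper);
        upper.downstream().add(out);

        run(source);
        System.out.println(out.received); // prints "[A, B]"
    }
}
```

The trade-off in this sketch is that the type system no longer tells sources and sinks apart: a real source has to reject `put` at run time, and a former sink is just a stage whose `poll` always returns null.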