N-grams from Hungarian National Corpus
The Hungarian National Corpus (HNC) is divided into five subcorpora by regional language variant and also into five subcorpora by text genre. The subcorpus to be studied can be chosen by any combination of these, which makes the HNC an appropriate tool for studying differences not only between text genres but also between language variants. HGC aims to be a representative general-purpose corpus of present-day standard Hungarian.
HGC is based on the Hungarian National Corpus, with higher quality and a finer level of analysis and annotation: detailed morphosyntactic analysis and disambiguation with an updated processing toolchain, NP chunking, named entity recognition, distributional analysis, and built-in post-processing (multilevel frequency lists, subsequent searches on previous results). HGC is extended up to the 1 gigaword threshold, with extended metadata and cleared IPR.